OOV Word Detection using Hybrid Models with Mixed Types of Fragments
نویسندگان
چکیده
This paper presents initial studies to improve the out-ofvocabulary (OOV) word detection performance by using mixed types of fragment units in one hybrid system. Three types of fragment units, subwords, syllables, and graphones, were combined in two different ways to build the hybrid lexicon and language model. The experimental results show that hybrid systems with mixed types of fragment units perform better than hybrid systems using only one type of fragment unit. After comparing the OOV word detection performance with the number and length of fragment units of each system, we proposed future work to better utilize mixed types of fragment units in a hybrid system.
منابع مشابه
OOV Detection and Recovery Using Hybrid Models with Different Fragments
In this paper, we address the out-of-vocabulary (OOV) detection and recovery problem by developing three different fragmentword hybrid systems. A fragment language model (LM) and a word LM were trained separately and then combined into a single hybrid LM. Using this hybrid model, the recognizer can recognize any OOVs as fragment sequences. Different types of fragments, such as phones, subwords,...
متن کاملOOV-detection in large vocabulary system using automatically defined word-fragments as fillers
The problem of unknown words has been addressed using automatically generated ller fragments which augment the lexicon and are incorporated in the language model. These fragments are used to reduce the damage on in-vocabulary words, to detect OOV regions and to provide a phonetic transcription for these regions. The performance of this technique has been evaluated in terms of damage reduction e...
متن کاملInvestigation on language modelling approaches for open vocabulary speech recognition
By definition, words that are not present in a recognition vocabulary are called out-of-vocabulary (OOV) words. Recognition of unseen or new words is an important feature that is always desired in any real-world large vocabulary continuous speech recognition (LVCSR) system. However, human languages are complex in nature due to wide varieties of morphological richness such as inflections, deriva...
متن کاملTHE JOHNS HOPKINS UNIVERSITY Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition
Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of subword units. We present a novel probabilistic model to l...
متن کاملLearning Out-of-Vocabulary Words in Automatic Speech Recognition
Out-of-vocabulary (OOV) words are unknown words that appear in the testing speech but not in the recognition vocabulary. They are usually important content words such as names and locations which contain information crucial to the success of many speech recognition tasks. However, most speech recognition systems are closed-vocabulary recognizers that only recognize words in a fixed finite vocab...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012